Lecture 7: Linear models

Varada Kolhatkar

Announcements

  • Where to find slides?
    • https://kvarada.github.io/cpsc330-slides/lecture.html
  • HW3 is due next Tuesday, Oct 1st, at 11:59 pm.
    • You can work in pairs for this assignment.

Recap: Dealing with text features

  • Preprocessing text to fit into machine learning models using text vectorization.
  • Bag of words representation

Recap: sklearn CountVectorizer

  • Use scikit-learn’s CountVectorizer to encode text data
  • CountVectorizer: Transforms text into a matrix of token counts
  • Important parameters:
    • max_features: Control the number of features used in the model
    • max_df, min_df: Control document frequency thresholds
    • ngram_range: Defines the range of n-grams to be extracted
    • stop_words: Enables the removal of common words that are typically uninformative in most applications, such as “and”, “the”, etc.
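
As a rough illustration of the parameters above, the following sketch fits several CountVectorizer variants on a tiny corpus. The corpus and the variable names are invented for this example; only the CountVectorizer parameters come from the slide.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for illustration only.
corpus = [
    "the movie was great",
    "the movie was terrible",
    "a great great soundtrack",
]

# Default settings: one column per distinct token seen in the corpus.
# The default token pattern drops single-character tokens such as "a".
vec_default = CountVectorizer()
X_default = vec_default.fit_transform(corpus)

# stop_words="english" removes common words such as "the" and "was".
vec_stop = CountVectorizer(stop_words="english")
X_stop = vec_stop.fit_transform(corpus)

# max_features=2 keeps only the 2 most frequent tokens across the corpus.
vec_small = CountVectorizer(max_features=2)
X_small = vec_small.fit_transform(corpus)

# min_df=2 keeps only tokens that appear in at least 2 documents.
vec_min_df = CountVectorizer(min_df=2)
X_min_df = vec_min_df.fit_transform(corpus)

# ngram_range=(1, 2) adds bigrams such as "the movie" to the unigrams.
vec_bigram = CountVectorizer(ngram_range=(1, 2))
X_bigram = vec_bigram.fit_transform(corpus)

print("default vocabulary:", sorted(vec_default.vocabulary_))
print("without stop words:", sorted(vec_stop.vocabulary_))
print("min_df=2 vocabulary:", sorted(vec_min_df.vocabulary_))
print("shapes:", X_default.shape, X_small.shape, X_bigram.shape)
```

Each variant returns a sparse matrix with one row per document; only the number and meaning of the columns change.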

Recap: Incorporating text features in a machine learning pipeline

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

text_pipeline = make_pipeline(
    CountVectorizer(),
    SVC()
)
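
A minimal usage sketch for the pipeline above, assuming a tiny hand-made dataset. The messages and the "spam"/"ham" labels are hypothetical and far too small for a meaningful model; the point is only to show that raw strings go straight into fit and predict.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data, invented for this sketch.
X_train = [
    "free prize click now",
    "win a free prize today",
    "meeting at noon tomorrow",
    "lunch meeting moved to noon",
]
y_train = ["spam", "spam", "ham", "ham"]

text_pipeline = make_pipeline(CountVectorizer(), SVC())

# fit() learns the vocabulary from X_train, converts the text to counts,
# and trains the SVC on those counts.
text_pipeline.fit(X_train, y_train)

# predict() applies the already-fitted vectorizer before calling the SVC.
predictions = text_pipeline.predict(["free prize today", "noon meeting"])
print(predictions)

# make_pipeline names each step after its lowercased class name.
vocab = text_pipeline.named_steps["countvectorizer"].vocabulary_
print("number of learned features:", len(vocab))
```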

Class demo

(iClicker) Exercise 6.2

iClicker cloud join link: https://join.iclicker.com/VYFJ

Select all of the following statements which are TRUE.

    1. handle_unknown="ignore" would treat all unknown categories equally.
    2. As you increase the value of the max_features hyperparameter of CountVectorizer, the training score is likely to go up.
    3. Suppose you are encoding text data using CountVectorizer. If you encounter a word in the validation or the test split that’s not available in the training data, you’ll get an error.
    4. In the code below, inside cross_validate, each fold might have a slightly different number of features (columns).
pipe = make_pipeline(CountVectorizer(), SVC())
cross_validate(pipe, X_train, y_train)

Linear models

  • Linear models assume that the relationship between X and y is linear.
  • With only one feature, the model is a straight line.
  • What do we need to represent a line?
    • Slope (\(w_1\)): Determines the angle of the line.
    • Y-intercept (\(w_0\)): Where the line crosses the y-axis.

  • Making predictions
    • \(\hat{y} = w_1 \times \text{number of hours studied} + w_0\)
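
The prediction rule above can be sketched in a few lines. The function name predict_grade and the weight values are hypothetical, picked only to make the arithmetic easy to follow; in practice the weights are learned from data.

```python
def predict_grade(hours_studied, w1, w0):
    """Return the linear prediction w1 * hours_studied + w0."""
    return w1 * hours_studied + w0


# Hypothetical weights, chosen only for illustration.
w1 = 5.0   # slope: predicted grade rises by 5 per extra hour studied
w0 = 50.0  # intercept: predicted grade for 0 hours studied

for hours in [0, 2, 4]:
    print(hours, "hours ->", predict_grade(hours, w1, w0))
# 0 hours -> 50.0
# 2 hours -> 60.0
# 4 hours -> 70.0
```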

Ridge vs. LinearRegression
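
One way to see the difference between the two is to compare learned coefficients. The sketch below assumes synthetic data generated with NumPy (the true weights and noise level are invented). Ridge is linear regression with an L2 penalty whose strength is set by alpha, so larger alpha values pull the coefficients toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data, invented for this sketch: 30 examples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
true_w = np.array([3.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.5, size=30)

# LinearRegression has no regularization hyperparameter.
lr = LinearRegression().fit(X, y)

# Ridge adds an L2 penalty; alpha controls how strong it is.
ridge_small = Ridge(alpha=1.0).fit(X, y)
ridge_large = Ridge(alpha=100.0).fit(X, y)

models = [
    ("LinearRegression", lr),
    ("Ridge(alpha=1)", ridge_small),
    ("Ridge(alpha=100)", ridge_large),
]
for name, model in models:
    norm = float(np.linalg.norm(model.coef_))
    print(f"{name:18s} coef={np.round(model.coef_, 3)} norm={norm:.3f}")
```

The printed norms shrink as alpha grows, which is the sense in which alpha controls how far the fitted line or plane can bend toward the training data.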

Logistic Regression

  • Suppose your target is binary: pass or fail
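
A minimal sketch of the pass/fail setting, assuming a made-up dataset with a single feature (hours studied). The numbers are hypothetical; the example only shows that LogisticRegression accepts a binary target and returns both hard predictions and probability scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the student passed.
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
passed = np.array(
    ["fail", "fail", "fail", "fail", "pass", "pass", "pass", "pass"]
)

clf = LogisticRegression().fit(hours, passed)

new_students = np.array([[1.0], [4.5]])

# Hard predictions: one class label per example.
labels = clf.predict(new_students)

# Probability scores: one column per class, in the order of clf.classes_.
probs = clf.predict_proba(new_students)

print("classes:", clf.classes_)
print("predictions:", labels)
print("probabilities:\n", np.round(probs, 3))
print("learned slope and intercept:", clf.coef_, clf.intercept_)
```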

Sigmoid
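
The sigmoid function, \(1 / (1 + e^{-z})\), squashes any real-valued raw score into the interval (0, 1), which is how logistic regression turns a linear score into a probability. The sketch below assumes hypothetical weights for a one-feature model; the helper names are made up for illustration.

```python
import math


def sigmoid(z):
    """Map a raw score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_pass(hours_studied, w1, w0):
    """Probability of the positive class for a one-feature linear score."""
    raw_score = w1 * hours_studied + w0
    return sigmoid(raw_score)


# Hypothetical weights, chosen only for illustration.
w1, w0 = 2.0, -5.0

# A raw score of exactly 0 lies on the decision boundary.
print(sigmoid(0))  # 0.5

for hours in [1.0, 2.5, 4.0]:
    print(hours, round(predict_proba_pass(hours, w1, w0), 3))
```

Large positive scores map close to 1, large negative scores map close to 0, and scores near 0 map close to 0.5.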

Parametric vs. non-Parametric models

(iClicker) Exercise 7.1

iClicker cloud join link: https://join.iclicker.com/VYFJ

Select all of the following statements which are TRUE.

    1. Increasing the hyperparameter alpha of Ridge is likely to decrease model complexity.
    2. Ridge can be used with datasets that have multiple features.
    3. With Ridge, we learn one coefficient per training example.
    4. If you train a linear regression model on a 2-dimensional problem (2 features), the model will learn 3 parameters: one for each feature and one for the bias term.

(iClicker) Exercise 7.2

iClicker cloud join link: https://join.iclicker.com/VYFJ

Select all of the following statements which are TRUE.

    1. Increasing logistic regression’s C hyperparameter increases model complexity.
    2. The raw output score can be used to calculate the probability score for a given prediction.
    3. For a linear classifier trained on \(d\) features, the decision boundary is a \((d-1)\)-dimensional hyperplane.
    4. A linear model is likely to be uncertain about the data points close to the decision boundary.

Class demo